merge from upstream #45

l3utterfly · 2024-11-16T06:43:07Z

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

…0177) Branch: GraniteToolCallTemplate Signed-off-by: Gabe Goodhart <[email protected]>

* metal : add quantized FA (vec) support ggml-ci
* metal : add quantized FA (non-vec) support
* metal : fix support check ggml-ci
* metal : clean-up
* metal : clean-up (cont)
* metal : fix shared memory calc + reduce smem + comments
* metal : float-correctness
* metal : minor [no ci]

ggml-ci

* ggml : add initial BF16 support ggml-ci
* metal : add mul_mat_id BF16 support ggml-ci
* metal : check for bfloat support on the Metal device ggml-ci
* metal : better var names [no ci]
* metal : do not build bfloat kernels when not supported ggml-ci
* metal : try to fix BF16 support check ggml-ci
* metal : this should correctly check bfloat support

…eleration (ggerganov#10133)

* rwkv6: rename to wkv6
* rwkv6: support avx2 avx512 armv8 armv9
* rwkv6: update cuda file name
* rwkv6: rename params
* wkv on sycl
* sycl: add some ops
* sycl: Enhance OP support judgment
* wkv6: drop armv9 and transfer to GGML style ggml-ci
* sync : ggml
* update the function to use appropriate types
* fix define error
* Update ggml/src/ggml-cpu.c
* add appropriate asserts
* move element-wise functions outside
* put the declaration outside the loop
* rewrite to be more in line with the common pattern for distributing threads
* use recommended way GGML_TENSOR_LOCALS

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: Plamen Minev <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: Meng, Hengyu <[email protected]>
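The wkv6 commit mentions rewriting the kernel to follow the common pattern for distributing work across threads. A minimal C sketch of that pattern, assuming it is the usual contiguous row-chunk split (rows per thread rounded up, ranges clamped to the row count); the helper `thread_row_range` is hypothetical and not a ggml API:

```c
#include <assert.h>

/* Hypothetical helper (not a ggml API): compute the half-open range
 * [*ir0, *ir1) of rows that thread `ith` of `nth` should process out
 * of `nr` total rows. Rows are split into contiguous chunks of
 * ceil(nr / nth); trailing threads may receive an empty range. */
static void thread_row_range(int nr, int ith, int nth, int *ir0, int *ir1) {
    assert(nth > 0 && ith >= 0 && ith < nth);
    const int dr = (nr + nth - 1) / nth;  /* rows per thread, rounded up */
    int start = dr * ith;
    int end   = start + dr;
    if (start > nr) start = nr;           /* clamp so the range stays valid */
    if (end   > nr) end   = nr;
    *ir0 = start;
    *ir1 = end;
}
```

For `nr = 10` rows and `nth = 4` threads, `dr` is 3, so the threads get [0,3), [3,6), [6,9) and [9,10). Each thread then loops over its own range with no synchronization between rows.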

Co-authored-by: EC2 Default User <[email protected]>

* server : simple chat UI with vuejs and daisyui
* move old files to legacy folder
* embed deps into binary
* basic markdown support
* add conversation history, save to localStorage
* fix bg-base classes
* save theme preferences
* fix tests
* regenerate, edit, copy buttons
* small fixes
* docs: how to use legacy ui
* better error handling
* make CORS preflight more explicit
* add GET method for CORS
* fix tests
* clean up a bit
* better auto scroll
* small fixes
* use collapse-arrow
* fix closeAndSaveConfigDialog
* small fix
* remove console.log
* fix style for <pre> element
* lighter bubble color (less distracting when reading)

* llama.swift : exclude ggml-metal-embed.metal
* swift : exclude build/

* ggml : add ggml_flash_attn_ext_get_prec
* metal : use F16 precision in FA kernels ggml-ci
* metal : minor clean-up
* metal : compile-guard bf16 FA kernels ggml-ci
* build : remove obsolete compile flag [no ci]
* metal : prevent int overflows [no ci]
* cuda : disable BF16 FA ggml-ci
* metal : fix BF16 requirement for FA kernels ggml-ci
* make : clean-up [no ci]

* metal : opt-in compile flag for BF16 ggml-ci
* ci : use BF16 ggml-ci
* swift : switch back to v12
* metal : has_float -> use_float ggml-ci
* metal : fix BF16 check in MSL ggml-ci

…ganov#10156)

This change upstreams llamafile's CPU matrix multiplication kernels for ppc64le, using MMA builtins for the FP32 datatype. This change results in a consistent 90% improvement in input processing time, and a 20% to 80% improvement in output processing time, across various batch sizes. The patch is tested with the Meta-Llama-3-8B, Mistral-7B, and Llama-2-7B-chat-hf models on an IBM POWER10 machine.

Signed-off-by: Amrita H S <[email protected]>

…ator when ‘ne’ is small (ggerganov#10213)

* metal : reorder write loop
* metal : int -> short, style ggml-ci

…rganov#10226)

* Add back samplers to server
* Added tooltips with basic information
* Fixed stretching of input fields.
* use component for settings input, move help msg to tooltips

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>

Signed-off-by: Xiaodong Ye <[email protected]>

* use 128 bit loads (I've tried 256->128 to death and it's slower)
* double accumulator
* avx bf16 vec dot
* +3% q4_0 inference
* +7% tg +5% pp compared to master
* slower f16c version, kept for reference
* 256b version, also slow. I tried :)
* revert f16
* faster with madd
* split to functions
* Q8_0 and IQ4_NL, 5-7% faster
* fix potential overflow (performance reduced)
* 16 bit add for q4_0 only
* merge
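The AVX commit above adds a bf16 vector dot product and mentions a "double accumulator". A minimal scalar C sketch of that idea, assuming "double accumulator" means two independent partial sums (which shortens the floating-point dependency chain, as two vector accumulators would in SIMD code); the type and function names are hypothetical, and this is not the AVX code from the commit:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* bf16 stored as a raw 16-bit pattern: the upper half of an IEEE-754 float. */
typedef uint16_t bf16_t;

static float bf16_to_f32(bf16_t h) {
    uint32_t bits = (uint32_t)h << 16;  /* low mantissa bits become zero */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Truncating float -> bf16 conversion (no rounding); used to build inputs. */
static bf16_t f32_to_bf16_trunc(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (bf16_t)(bits >> 16);
}

/* Dot product of two bf16 vectors using two independent accumulators:
 * even elements go into acc0, odd elements into acc1, and the partial
 * sums are combined at the end. */
static float bf16_vec_dot(size_t n, const bf16_t *x, const bf16_t *y) {
    assert(n == 0 || (x != NULL && y != NULL));
    float acc0 = 0.0f, acc1 = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += bf16_to_f32(x[i])     * bf16_to_f32(y[i]);
        acc1 += bf16_to_f32(x[i + 1]) * bf16_to_f32(y[i + 1]);
    }
    if (i < n) {  /* odd-length tail */
        acc0 += bf16_to_f32(x[i]) * bf16_to_f32(y[i]);
    }
    return acc0 + acc1;
}
```

Because the two partial sums are combined only at the end, the result can differ in the last bits from a single-accumulator loop for general inputs; for small integers and powers of two, as in a quick check, it is exact.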

ggml-ci

…ags (ggerganov#10314)

…nov#10286) fixes ggerganov#10285

Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses the B loads across the rows and also reuses some addressing calculations. This required manually partially unrolling the loop, since the compiler is less willing to unroll outer loops. Add bounds-checking on the last iteration of the loop. I think this was at least partly broken before. Optimize the Q4_K shader to vectorize most loads and reduce the number of bit twiddling instructions.
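The Vulkan change above computes two result rows per workgroup so that each load of B is shared across both rows, and bounds-checks the last loop iteration. A minimal scalar C sketch of that access pattern, assuming a plain row-major FP32 matrix instead of the quantized Q{4,5}_{0,1} blocks the shader actually handles; `matvec_two_rows` is a hypothetical name, and the fixed 4-column chunk only stands in for the shader's partially unrolled loop:

```c
#include <assert.h>
#include <stddef.h>

/* y = A * b, where A is row-major with nrows x ncols elements.
 * Each outer iteration (one "workgroup") produces two output rows,
 * so every b[k] is loaded once and used for both rows. */
static void matvec_two_rows(const float *A, const float *b, float *y,
                            size_t nrows, size_t ncols) {
    assert(A != NULL && b != NULL && y != NULL);
    for (size_t r = 0; r < nrows; r += 2) {
        const float *a0 = A + r * ncols;
        /* Second row of the pair; NULL when nrows is odd and this is the last pair. */
        const float *a1 = (r + 1 < nrows) ? a0 + ncols : NULL;
        float s0 = 0.0f, s1 = 0.0f;
        for (size_t k0 = 0; k0 < ncols; k0 += 4) {
            /* Bounds check: the final chunk may hold fewer than 4 columns. */
            const size_t kend = (k0 + 4 <= ncols) ? k0 + 4 : ncols;
            for (size_t k = k0; k < kend; k++) {
                const float bk = b[k];  /* one load of b, reused for both rows */
                s0 += a0[k] * bk;
                if (a1 != NULL) s1 += a1[k] * bk;
            }
        }
        y[r] = s0;
        if (a1 != NULL) y[r + 1] = s1;
    }
}
```

With 3 rows and 5 columns, the second pair has only one valid row and the second column chunk has only one valid column. Both cases are handled by the checks instead of reading past the end of `A` or `b`.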

gabe-l-hart and others added 30 commits November 5, 2024 14:23

llama : add <|tool_call|> formatting to Granite template (ggerganov#1…

b8deef0

…0177) Branch: GraniteToolCallTemplate Signed-off-by: Gabe Goodhart <[email protected]>

ggml : adjust is_first_call init value (ggerganov#10193)

1dc04b2

ggml-ci

metal : fix from ptr buffer name (ggerganov#10189)

94d8cb8

server : remove hack for extra parallel slot (ggerganov#10187)

b11f9ba

ggml-ci

fix q4_0_8_8 format for corrupted tokens issue (ggerganov#10198)

2319126

Co-authored-by: EC2 Default User <[email protected]>

DRY: Fixes clone functionality (ggerganov#10192)

5107e8c

Remove identical wte/etw logic for jais (ggerganov#10203)

60e17ce

ggml : add ggml-cpu.h to the public headers (ggerganov#10204)

97404c4

scripts : sync update

a2c6fd7

sync : ggml

3b08828

scripts : add amx to sync-ggml.sh [no ci]

eec4d71

server : minor UI fix (ggerganov#10207)

76c6e7f

swift : exclude ggml-metal-embed.metal (ggerganov#10211)

d05b312

* llama.swift : exclude ggml-metal-embed.metal
* swift : exclude build/

metal : improve clarity (minor) (ggerganov#10171)

695ad75

metal : opt-in compile flag for BF16 (ggerganov#10218)

ec450d3

* metal : opt-in compile flag for BF16 ggml-ci
* ci : use BF16 ggml-ci
* swift : switch back to v12
* metal : has_float -> use_float ggml-ci
* metal : fix BF16 check in MSL ggml-ci

scripts : fix pattern and get n_tokens in one go (ggerganov#10221)

8fc393f

ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL oper…

5b359bb

…ator when ‘ne’ is small (ggerganov#10213)
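The commit above fixes a division by zero in the `dne` calculation of the CUDA COUNT_EQUAL operator when `ne` is small. The kernel code itself is not shown here. The following is only a generic C sketch of how this class of bug arises, assuming `dne` (elements per block) is derived from `ne` by dividing over a fixed block count and is later used as a divisor. All function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Floor division: returns 0 whenever ne < nblocks, so any later
 * division by the result (e.g. ne / dne) divides by zero. */
static int64_t elems_per_block_floor(int64_t ne, int64_t nblocks) {
    assert(nblocks > 0);
    return ne / nblocks;
}

/* Ceiling division: at least 1 whenever ne >= 1, so it is safe to
 * use as a divisor afterwards. */
static int64_t elems_per_block_ceil(int64_t ne, int64_t nblocks) {
    assert(ne > 0 && nblocks > 0);
    return (ne + nblocks - 1) / nblocks;
}

/* Number of blocks needed to cover ne elements at dne elements each. */
static int64_t blocks_needed(int64_t ne, int64_t dne) {
    assert(dne > 0);  /* this is the check the floor version can violate */
    return (ne + dne - 1) / dne;
}
```

For `ne = 10` and 64 blocks, the floor version yields 0 elements per block, while the ceiling version yields 1, so `blocks_needed` can safely compute 10 blocks.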

metal : hide debug messages from normal log

46323fa

llama : fix Qwen model type strings

f018acb

metal : fix F32 accumulation in FA vec kernel (ggerganov#10232)

bb38cdd

metal : fix build and some more comments (ggerganov#10229)

39a334a

metal : reorder write loop in mul mat kernel + style (ggerganov#10231)

6423c65

* metal : reorder write loop
* metal : int -> short, style ggml-ci

vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (gge…

160687b

…rganov#10226)

Rbiessy and others added 14 commits November 15, 2024 13:10

sycl: Update Intel docker images to use DPC++ 2025.0 (ggerganov#10305)

57f8355

ci: build test musa with cmake (ggerganov#10298)

f0204a0

Signed-off-by: Xiaodong Ye <[email protected]>

sync : ggml

cbf5541

ggml : vulkan logs (whisper/2547)

3225008

cmake : fix ppc64 check (whisper/0)

09ecbcb

ggml-ci

ggml : fix some build issues

883d206

scripts: update compare-llama-bench.py (ggerganov#10319)

4047be7

Make updates to fix issues with clang-cl builds while using AVX512 fl…

74d73dc

…ags (ggerganov#10314)

llama : save number of parameters and the size in llama_model (ggerga…

89e4caa

…nov#10286) fixes ggerganov#10285

ggml : optimize Q4_0 into Q4_0_X_Y repack (ggerganov#10324)

1e58ee1

vulkan : add cmake preset debug/release (ggerganov#10306)

dd3a6ce

Merge branch 'layla-build' into merge

bce287c

l3utterfly merged commit b68eda1 into layla-build Nov 16, 2024
7 of 9 checks passed

l3utterfly deleted the merge branch November 16, 2024 06:50

github-actions bot added the documentation, SYCL, Nvidia GPU, testing, build, examples, devops, python, server, ggml, Kompute, Apple Metal, script, and nix labels Nov 16, 2024
